
Fix fetch cache key collisions for Request and FormData bodies #332

Merged
james-elicx merged 7 commits into cloudflare:main from JaredStowell:jstowell/fix-cache-key-collisions
Mar 8, 2026

Conversation

@JaredStowell
Contributor

Summary

Fix fetch cache key generation so cached requests are keyed by the actual effective request body, including when the body is provided on a Request object.

This also fixes ambiguous FormData serialization that could cause distinct payloads to collapse into the same cache entry.

Problem

Vinext’s fetch cache key generation did not fully account for the request body in all supported call shapes.

Request bodies were ignored

The cache key logic correctly merged headers from both input and init, but it only serialized init.body. If the body lived on a Request object, vinext effectively treated the request as body-less for cache-key purposes.

Example:

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "alpha" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "bravo" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

Before this change, both requests could produce the same cache key even though the payloads were different.

That makes cached POST-style fetches unsafe: a response generated for one payload can be reused for another payload.

Concrete failure modes include:

  • a search endpoint returning results for the wrong query
  • a filtered API response being reused for the wrong filter set
  • application data keyed by request body being served from the wrong cached entry

Multi-value FormData could collide

FormData values were serialized by joining them with commas, which is ambiguous.

Example:

const formA = new FormData()
formA.append("name", "a,b")
formA.append("name", "c")
// serialized as: "name=a,b,c"

const formB = new FormData()
formB.append("name", "a")
formB.append("name", "b,c")
// also serialized as: "name=a,b,c"

These are different payloads, but they produced the same cache-key fragment.

Root Cause

The issue was in the fetch cache key builder, not in cache storage itself.

  • collectHeaders() already handled Request inputs correctly
  • buildFetchCacheKey() used serializeBody(init)
  • serializeBody() only looked at init.body
  • Request bodies were therefore omitted unless duplicated in init
  • FormData entries were flattened with comma-joining, which is not injective

In other words, the cache key did not always represent the true effective request.
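A minimal reconstruction of the gap, assuming the pre-fix serializer consulted only the init argument (the function name serializeBodyBefore is illustrative, not the actual vinext code):

```typescript
// Illustrative sketch of the pre-fix behavior: only init.body is consulted,
// so a body carried on a Request object contributes nothing to the key.
function serializeBodyBefore(init?: { body?: unknown }): string {
  return typeof init?.body === "string" ? init.body : "";
}

// Two fetches with different Request payloads but no init.body both key on
// the empty string, which is exactly the collision described above.
const fragmentAlpha = serializeBodyBefore(undefined);
const fragmentBravo = serializeBodyBefore(undefined);
```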

What Changed

Fetch cache keying

  • Include Request object bodies in cache key generation
  • Support body extraction from Request inputs without mutating the original fetch behavior
  • Preserve the original request body for the underlying network fetch
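The clone-based extraction above can be sketched as follows (extractBodyForKey is a hypothetical name, not the actual vinext helper); clone() duplicates the body stream, so the original Request stays forwardable:

```typescript
// Read a Request body for cache keying without consuming the original.
async function extractBodyForKey(input: Request | string | URL): Promise<string | undefined> {
  if (!(input instanceof Request) || input.body === null) return undefined;
  // Reading from the clone leaves the original stream untouched.
  return await input.clone().text();
}

async function demo() {
  const req = new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "alpha" }),
  });
  const keyFragment = await extractBodyForKey(req);
  const forwarded = await req.text(); // original body still consumable
  return { keyFragment, forwarded };
}
```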

FormData serialization

  • Replace ambiguous comma-joined serialization with structured serialization
  • Serialize per-key value lists in a format that preserves boundaries and ordering semantics
  • Keep existing oversized-body protections in place
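A structured, injective serialization can be sketched like this (a simplified illustration, not the exact vinext format; the real implementation also enforces size limits):

```typescript
// Each entry is JSON-encoded on its own, so commas inside values can never
// merge two values into one, unlike the old comma-joined format.
function serializeFormEntries(form: FormData): string {
  const parts: string[] = [];
  for (const key of new Set(form.keys())) {
    const values = form.getAll(key).map((v) =>
      typeof v === "string"
        ? { kind: "string", value: v }
        : { kind: "file", name: v.name, type: v.type },
    );
    parts.push(JSON.stringify([key, values]));
  }
  return parts.join("\n");
}
```

Under this scheme the formA and formB payloads from the example above serialize to distinct strings, so they no longer collide.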

Tests

Add regression coverage for:

  • different Request bodies producing distinct cache entries
  • identical Request bodies reusing the same cache entry
  • ambiguous comma-containing multi-value FormData payloads not colliding
  • Request bodies still being forwarded intact after cache-key generation

Examples

Example 1: POST search requests

Before:

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "alpha" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "bravo" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

These could collide and reuse the wrong cached response.

After:

  • the request body is included in the cache key
  • each distinct payload gets its own cache entry

Example 2: Multi-value form submissions

Before:

const formA = new FormData()
formA.append("name", "a,b")
formA.append("name", "c")

const formB = new FormData()
formB.append("name", "a")
formB.append("name", "b,c")

These serialized to the same cache-key fragment.

After:

  • FormData values are serialized in a structured format
  • these payloads no longer collide

Why This Approach

This change fixes the root cause while keeping the existing caching model intact.

The principle is simple:

  • semantically different requests must not share a cache entry
  • semantically identical requests should still reuse the same cache entry

The patch stays within the current fetch cache architecture and only changes key generation and regression coverage.

Files Changed

  • packages/vinext/src/shims/fetch-cache.ts
  • tests/fetch-cache.test.ts

Test Plan

Ran targeted regression coverage for the affected surface:

pnpm dlx vitest run tests/fetch-cache.test.ts


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13731560dc

@james-elicx
Collaborator

/bigbonk review

@pkg-pr-new

pkg-pr-new bot commented Mar 7, 2026

Open in StackBlitz

npm i https://pkg.pr.new/vinext@332

commit: fa470c1


@ask-bonk ask-bonk bot left a comment

Review Summary

The problem this PR addresses is real and the fix direction is correct: Request object bodies were silently ignored in cache key generation, and the old FormData serialization was ambiguous. Good test coverage for the new cases.

I found a few issues — one correctness concern, one memory safety gap, and two housekeeping items.


Issues Found

1. (Bug) Confusing double-clone and silent fallthrough for FormData parsing failure — see inline comments on lines 213 and 224.

2. (Memory) Full body buffered before size check for Request bodies — same as Codex flagged. See inline comment on line 224.

3. (Housekeeping) Cache key version not bumped

The FormData serialization format changed from key=val1,val2 to JSON.stringify([key, [{kind,value},...]]). The CACHE_KEY_PREFIX on line 40 should be bumped to "v3" to make the format boundary explicit and prevent ambiguity during rollbacks or gradual deployments. Not strictly required for correctness (old entries won't match the new format), but it's good hygiene.

4. (Minor) FormData.keys() iteration order

The extracted serializeFormData helper (line 95) iterates FormData keys via new Set(formData.keys()). Iteration order is insertion-order in all major engines but not spec-mandated. Two FormData objects with the same entries appended in different orders will produce different cache keys. This is arguably correct (different wire order = different payload), just worth being aware of. The new Set() dedup combined with getAll() is the right approach.

5. (Test gap) Already-consumed Request body

There's no test for the case where a Request body has already been consumed (locked/disturbed stream). Currently input.clone() inside serializeBody would throw. It might be worth adding a test documenting this behavior — either it throws and the caller skips caching, or it should be handled gracefully.
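That behavior can be demonstrated in isolation (a standalone sketch, not a vinext test):

```typescript
// Once a Request body has been read, the stream is "disturbed" and the
// Fetch spec requires clone() to throw a TypeError.
async function cloneAfterConsume(): Promise<boolean> {
  const req = new Request("https://example.com/", { method: "POST", body: "payload" });
  await req.text(); // consumes (disturbs) the body stream
  try {
    req.clone();
    return false; // clone unexpectedly succeeded
  } catch {
    return true; // threw, as expected for a disturbed body
  }
}
```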

const call = fetchMock.mock.calls[0];
const forwardedRequest = call[0] as Request;
expect(forwardedRequest).toBeInstanceOf(Request);
expect(await forwardedRequest.text()).toBe("request-body-content");

Good test. One edge case worth considering for future coverage: what happens when the Request body has already been consumed (i.e., input.body is a locked/disturbed stream)? Currently input.clone() inside serializeBody would throw. A test documenting whether that throws or gracefully skips caching would be helpful.

@ask-bonk
Contributor

ask-bonk bot commented Mar 7, 2026

Review posted successfully. Here's a summary of the feedback:

Key findings:

  1. Redundant clone + silent fallthrough (bug): Line 213 creates input.clone() when requestClone already exists. More importantly, if .formData() parsing fails, the error is swallowed and execution falls through to raw arrayBuffer() keying — which produces unstable cache keys for multipart bodies (boundary strings differ between identical logical payloads). Suggested either using requestClone and not falling through, or skipping caching entirely on parse failure.

  2. Memory: full body buffered before size check (same issue Codex flagged): Line 224 reads the entire body via arrayBuffer() before checking the 1 MiB limit. Suggested adding a Content-Length pre-check as a fast-path rejection, and ideally streaming the body with incremental size guards for full parity with the ReadableStream path.

  3. Cache key version not bumped: The FormData serialization format changed substantially but CACHE_KEY_PREFIX is still "v2". Should be bumped to "v3".

  4. FormData.keys() iteration order: Noted as an acceptable design choice — insertion order in practice, not spec-mandated.

  5. Missing test for already-consumed Request body: input.clone() would throw on a disturbed stream. Worth documenting this behavior with a test.
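The bounded read suggested in point 2 can be sketched as follows (readBodyWithinLimit and the 1 MiB constant are assumptions based on this review, not the actual vinext code):

```typescript
const MAX_CACHE_KEY_BODY_BYTES = 1024 * 1024; // assumed 1 MiB limit

// Reject oversized bodies early via Content-Length when present, and
// enforce the limit incrementally while streaming otherwise.
async function readBodyWithinLimit(req: Request): Promise<Uint8Array[] | null> {
  const declared = Number(req.headers.get("content-length") ?? "0");
  if (declared > MAX_CACHE_KEY_BODY_BYTES) return null; // fast-path rejection
  if (req.body === null) return [];
  const reader = req.clone().body!.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    total += value.byteLength;
    if (total > MAX_CACHE_KEY_BODY_BYTES) return null; // streaming guard
    chunks.push(value);
  }
  return chunks;
}
```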

@southpolesteve
Collaborator

/bigbonk re-review please


@ask-bonk ask-bonk bot left a comment

Review Summary

This PR fixes two real correctness bugs in fetch cache key generation:

  1. Request object bodies were silently ignored in cache key computation
  2. FormData comma-joined serialization was ambiguous/non-injective

The approach is sound — the fix stays within the existing architecture and only changes key generation. Good test coverage (16 new tests covering the matrix of Request bodies, FormData collisions, multipart boundary normalization, malformed form bodies, and body passthrough). All 71 tests pass.

I see the previous review from ask-bonk already flagged several items, and I can confirm that some have been addressed in subsequent commits (cache key version bump to v3, Content-Length pre-check, SkipCacheKeyGenerationError for malformed bodies). A few remaining observations below.

Overall this is solid work. The issues below are minor-to-medium.

const boundedRequest = new Request(input.url, {
method: input.method,
headers: contentType ? { "content-type": contentType } : undefined,
body: new Blob(chunks.map((chunk) => chunk.slice())),

Unnecessary .slice() on chunks.

The chunks array was just read from a cloned request's stream in readRequestBodyChunksWithinLimit. These Uint8Array buffers are already owned by this function — nobody else holds a reference to them. The .slice() here creates a defensive copy of each chunk, but it's not needed since the buffers were freshly read from requestClone.body (which itself was cloned from input). The original input request body is untouched.

This is a minor allocation concern, not a bug.

Suggested change
body: new Blob(chunks.map((chunk) => chunk.slice())),
body: new Blob(chunks),

pushBodyChunk(JSON.stringify([key, { kind: "string", value: val }]));
continue;
}
if (val.size > MAX_CACHE_KEY_BODY_BYTES || getTotalBodyBytes() + val.size > MAX_CACHE_KEY_BODY_BYTES) {

File size check doesn't account for JSON serialization overhead.

The size check uses val.size (raw file bytes), but pushBodyChunk then calls JSON.stringify(...) which adds the key name, {kind: "file", name: ..., type: ..., value: ...} wrapper, and JSON escaping overhead. For a file right at the size limit, the serialized JSON string will be larger than val.size, potentially exceeding the limit without being caught by this pre-check.

In practice this is unlikely to matter (the pushBodyChunk function itself also checks totalBodyBytes against the limit), but the pre-check here is misleadingly precise. Not blocking — just noting the discrepancy.

}
}

for (const chunk of chunks) {

Non-form Request bodies don't go through pushBodyChunk for size checking.

When a Request body has a non-form content type, the chunks are decoded and pushed directly into bodyChunks (line 315) without going through pushBodyChunk, which means totalBodyBytes is not updated and the per-chunk size limit in pushBodyChunk is never hit. Only the final trailing chunk on line 319 goes through pushBodyChunk.

This is partially mitigated by the streaming size check in readRequestBodyChunksWithinLimit (which enforces the 1 MiB limit during reading), so this won't cause an actual over-limit key. But it means totalBodyBytes is inaccurate for the rest of the function, which could matter if future code depends on it.

Suggested change
for (const chunk of chunks) {
for (const chunk of chunks) {
pushBodyChunk(decoder.decode(chunk, { stream: true }));
}

});
const formData = await boundedRequest.formData();
await serializeFormData(formData, pushBodyChunk, getTotalBodyBytes);
canonicalizedContentType = formContentType === "multipart/form-data" && contentType

Boundary stripping only applies to multipart/form-data, not application/x-www-form-urlencoded.

This is correct behavior (urlencoded doesn't have boundaries), but the conditional reads a bit awkwardly. Consider adding a brief comment explaining the asymmetry, since a reader might wonder why only multipart gets canonicalizedContentType treatment.

Also: when formContentType is "application/x-www-form-urlencoded", canonicalizedContentType stays undefined, so the original content-type header (including any charset=... params) remains in the cache key via collectHeaders. The test on line 525 ("urlencoded Request bodies with different charset headers get separate cache entries") confirms this works correctly. Good.
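For illustration, multipart-only boundary stripping might look like this (canonicalizeContentType is a hypothetical name, not the code under review):

```typescript
// multipart/form-data embeds a randomly generated boundary parameter, so
// identical logical payloads carry different content-type headers; strip
// the boundary when keying. urlencoded has no boundary, so its params
// (e.g. charset) are kept as-is.
function canonicalizeContentType(contentType: string): string {
  if (!contentType.toLowerCase().startsWith("multipart/form-data")) return contentType;
  return contentType
    .split(";")
    .map((part) => part.trim())
    .filter((part) => !part.toLowerCase().startsWith("boundary="))
    .join("; ");
}
```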

const call = fetchMock.mock.calls[0];
const forwardedRequest = call[0] as Request;
expect(forwardedRequest).toBeInstanceOf(Request);
expect(await forwardedRequest.text()).toBe("request-body-content");

Good test, but it's testing body passthrough for cache misses only.

This verifies that input (the Request) still has a consumable body after cache key generation on a cache miss. Worth noting: on a cache hit, the input Request is never forwarded to originalFetch, so the body state doesn't matter. And on the stale-while-revalidate path (line 558 in fetch-cache.ts), input is forwarded to originalFetch for the background refetch — so the body also needs to be intact there. This test implicitly covers that since it's the same input reference, but an explicit test for the SWR path with a Request body would be stronger.

Not blocking — the current coverage is adequate for the fix.

expect(fetchMock).toHaveBeenCalledTimes(1);
});

it("malformed multipart Request bodies bypass cache instead of hashing raw bytes", async () => {

Good test for malformed multipart fallback behavior.

This confirms that malformed multipart bodies (where the boundary in the header doesn't match the body) bypass cache entirely (via SkipCacheKeyGenerationError) rather than falling through to raw-byte keying. This directly addresses the concern from the previous review about unstable cache keys from boundary-dependent raw bytes.

@ask-bonk

ask-bonk bot commented Mar 7, 2026

Review submitted successfully. Here's a summary of my feedback on PR #332:

Overall assessment: The PR fixes two real correctness bugs and the approach is sound. All 71 tests pass. The previous review's main concerns (cache key version bump, Content-Length pre-check, malformed body fallthrough) have been addressed in subsequent commits.

Remaining observations:

  1. Unnecessary .slice() on chunks (line 298) — The Uint8Array buffers in chunks are already freshly read from a cloned request body. The defensive copy via .slice() is not needed and wastes allocations.

  2. Non-form Request bodies bypass pushBodyChunk size tracking (lines 314-315) — When a Request has a non-form content type, chunks are pushed directly to bodyChunks without updating totalBodyBytes. This is mitigated by the streaming size check in readRequestBodyChunksWithinLimit, but leaves totalBodyBytes inaccurate. Suggested using pushBodyChunk instead.

  3. File size pre-check doesn't account for JSON serialization overhead (line 106) — Minor discrepancy: the pre-check uses raw val.size but the actual serialized JSON is larger. Not a real bug since pushBodyChunk also enforces the limit.

  4. Positive notes: The malformed multipart test (line 501) correctly validates that broken boundaries skip cache entirely rather than falling through to raw-byte keying. The boundary-stripping logic for multipart vs urlencoded is handled correctly, confirmed by the charset test.

@james-elicx
Collaborator

/bigbonk re-review this pr please


@ask-bonk ask-bonk bot left a comment

Review Summary

This PR fixes two real correctness bugs in fetch cache key generation:

  1. Request object bodies were silently ignored, causing distinct POST payloads to share cache entries
  2. FormData comma-joined serialization was ambiguous/non-injective

The approach is well-structured: body extraction uses request.clone() to preserve the original for forwarding, Content-Length pre-check avoids unnecessary cloning for oversized bodies, malformed multipart bodies correctly skip caching via SkipCacheKeyGenerationError instead of falling through to raw-byte keying, and the cache key version was bumped to v3.

The test coverage is thorough — 16 new tests covering Request bodies (distinct, identical, FormData with comma collisions, multipart boundary normalization, malformed multipart, charset handling), init FormData improvements (comma collisions, entry order, file metadata), body passthrough verification, already-consumed body handling, oversized Content-Length fast-path, and SWR body preservation.

All 73 tests pass. The previous two review rounds' major concerns have been addressed. A few minor observations below — none are blocking.

const boundedRequest = new Request(input.url, {
method: input.method,
headers: contentType ? { "content-type": contentType } : undefined,
body: new Blob(chunks as unknown as BlobPart[]),

Nit: the as unknown as BlobPart[] cast is unnecessary. Uint8Array[] is already assignable to BlobPart[] since Uint8Array extends ArrayBufferView which is a valid BlobPart. This compiles fine without the cast:

Suggested change
body: new Blob(chunks as unknown as BlobPart[]),
body: new Blob(chunks),

Comment on lines +314 to +315
for (const chunk of chunks) {
pushBodyChunk(decoder.decode(chunk, { stream: true }));

Minor: when the Request body has a non-form content type, chunks are pushed directly via pushBodyChunk here (good — the prior review's concern about bypassing pushBodyChunk appears to have been addressed). However, totalBodyBytes is incremented inside pushBodyChunk based on encoder.encode(chunk).byteLength, which re-encodes the already-decoded string. Since readRequestBodyChunksWithinLimit already enforced the 1 MiB limit on the raw bytes, this double-encoding is safe but slightly wasteful for large bodies. Not a real issue — just noting for awareness.

pushBodyChunk(JSON.stringify([key, { kind: "string", value: val }]));
continue;
}
if (val.size > MAX_CACHE_KEY_BODY_BYTES || getTotalBodyBytes() + val.size > MAX_CACHE_KEY_BODY_BYTES) {

Minor: the val.size pre-check here measures raw file bytes, but the actual data pushed to pushBodyChunk is a JSON string that includes the key name, {kind, name, type, value} wrapper, and JSON escaping. For files near the limit, the serialized JSON will be somewhat larger than val.size. This is effectively a non-issue because pushBodyChunk also enforces the limit on the serialized output, so worst case a slightly-over-limit file gets caught one call later. Just noting the asymmetry.

Comment on lines +291 to +302
const formContentType = getParsedFormContentType(contentType);

if (formContentType) {
try {
const boundedRequest = new Request(input.url, {
method: input.method,
headers: contentType ? { "content-type": contentType } : undefined,
body: new Blob(chunks as unknown as BlobPart[]),
});
const formData = await boundedRequest.formData();
await serializeFormData(formData, pushBodyChunk, getTotalBodyBytes);
canonicalizedContentType = formContentType === "multipart/form-data" && contentType

Worth a brief comment explaining the asymmetry: canonicalizedContentType is only set for multipart/form-data (to strip the boundary) but not for application/x-www-form-urlencoded (which has no boundary to strip). The charset and other params for urlencoded are preserved in the cache key via collectHeaders. The test at line 567 confirms this is correct, but a reader unfamiliar with the code might wonder why only multipart gets special treatment.

Non-blocking — just a readability suggestion.

@ask-bonk
Contributor

ask-bonk bot commented Mar 8, 2026

Review submitted. Here's a summary of my assessment:

Verdict: Approved

This PR fixes two real correctness bugs:

  1. Request object bodies were silently ignored in cache key generation, causing distinct POST payloads to collide
  2. FormData comma-joined serialization was ambiguous (non-injective), allowing different payloads to produce identical cache keys

The implementation is solid:

  • Uses request.clone() to read the body for keying while preserving the original for forwarding
  • Content-Length pre-check avoids cloning oversized bodies
  • Malformed multipart bodies correctly skip caching via SkipCacheKeyGenerationError rather than falling through to unstable raw-byte keying
  • Cache key version bumped from v2 to v3
  • All 73 tests pass, including 16 new regression tests

The previous two review rounds' major concerns (redundant clone, silent fallthrough, memory buffering, cache version bump) have all been addressed in subsequent commits. I left 4 minor non-blocking comments: an unnecessary type cast, a note on double-encoding in the non-form Request path, a note on the file size pre-check asymmetry, and a readability suggestion for the multipart-only canonicalization logic.

@james-elicx james-elicx merged commit 397a8dc into cloudflare:main Mar 8, 2026
17 checks passed